Supplementary Materials A Organization of Supplementary Materials

Neural Information Processing Systems

The supplementary materials consist of five main sections. In Appendix B, we give a detailed overview of the related literature. In Appendix C (Proofs for Section 3), we give the proofs of Theorem 1 and Proposition 1. In Appendix D (Algorithm and Implementation Details), we provide further details about the implementation and training procedure for PerSim and the RL methods we benchmark against. In Appendix E, we detail the setup used to run our experiments.




Towards Internet-Scale Training For Agents

Trabucco, Brandon, Sigurdsson, Gunnar, Piramuthu, Robinson, Salakhutdinov, Ruslan

arXiv.org Artificial Intelligence

The predominant approach for training web navigation agents gathers human demonstrations for a set of popular websites and hand-written tasks, but it is becoming clear that human data are an inefficient resource. We develop a pipeline to facilitate Internet-scale training for agents without laborious human annotations. In the first stage, an LLM generates tasks for 150k diverse websites. In the next stage, LLM agents complete tasks and produce trajectories. In the final stage, an LLM reviews the trajectories and judges their success. Language models are competitive with human annotators, detecting and filtering out harmful content with an accuracy of 97%, generating feasible tasks with an 89% rate, and judging successful trajectories with an 82.6% accuracy. Scaling the pipeline, agents based on Llama 3.1 70B solve 16.7% of tasks for 150k sites. Training on the data generated by our pipeline is competitive with training on human demonstrations. In data-limited settings derived from Mind2Web and WebLINX, we improve Step Accuracy by up to +89.5% and +122.1% respectively for agents trained on mixtures of data from our pipeline and human data. When training agents with all available human data from these benchmarks, agents fail to generalize to diverse real sites, and adding our data improves their generalization by +149.0% for WebLINX and +156.3% for Mind2Web. Code will be available at: data-for-agents.github.io.


Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

Yang, Yu, Xu, Pan

arXiv.org Artificial Intelligence

Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and the Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collecting data from specific environments can be both costly and unsafe in many scenarios, leading to suboptimal performance and limited few-shot prompt abilities due to the data-hungry nature of Transformer-based models. Additionally, the limited datasets used in pre-training make it challenging for Prompt-DT-style methods to distinguish between various RL tasks through prompts alone. To address these challenges, we introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA). We further incorporate prompt regularization to effectively differentiate between tasks based on prompt feature representations. Our approach integrates pre-trained language models with RL tasks seamlessly. Extensive empirical studies demonstrate that initializing with a pre-trained language model significantly enhances the performance of Prompt-DT on unseen tasks compared to baseline methods.
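The LoRA fine-tuning the abstract refers to replaces full weight updates with a trainable low-rank correction to each frozen weight matrix. A minimal NumPy sketch of one adapted linear layer (illustrative sizes and names, not LPDT's actual code or configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 4  # illustrative dimensions, not LPDT's settings

# Frozen pre-trained weight matrix.
W = rng.standard_normal((d_in, d_out))

# LoRA factors: A is small random, B starts at zero, so at initialization
# the adapted layer computes exactly the same output as the frozen one.
A = rng.standard_normal((rank, d_out)) * 0.01
B = np.zeros((d_in, rank))
alpha = 8.0  # scaling hyperparameter, commonly called lora_alpha

def lora_linear(x):
    """Forward pass of a LoRA-adapted linear layer.

    Only A and B (rank * (d_in + d_out) parameters) would be trained;
    W stays frozen, which is what makes the adaptation cheap.
    """
    return x @ W + (alpha / rank) * (x @ B @ A)

x = rng.standard_normal((2, d_in))
# Because B is zero, the low-rank update contributes nothing initially:
assert np.allclose(lora_linear(x), x @ W)
```

The design point is that the trainable parameter count scales with `rank`, not with `d_in * d_out`, which is why LoRA suits the data-limited settings the abstract describes.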


Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach

Huynh, Nhat-Minh, Cao, Hoang-Giang, Wu, I-Chen

arXiv.org Artificial Intelligence

Pommerman is a multi-agent environment that has received considerable attention from researchers in recent years. This environment is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents. Pommerman presents significant challenges for model-free reinforcement learning due to delayed action effects, sparse rewards, and false positives, where opponent players can lose due to their own mistakes. This study introduces a system designed to train multi-agent systems to play Pommerman using a combination of curriculum learning and population-based self-play. We also tackle two challenging problems in deploying a multi-agent training system for competitive games: sparse rewards and a suitable matchmaking mechanism. Specifically, we propose an adaptive annealing factor based on agents' performance to dynamically adjust the dense exploration reward during training. Additionally, we implement a matchmaking mechanism utilizing the Elo rating system to pair agents effectively. Our experimental results demonstrate that our trained agent can outperform top learning agents without requiring communication among allied agents.
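The Elo-based matchmaking the abstract mentions can be sketched in a few lines: each agent carries a rating, ratings move after every match, and opponents are drawn from a rating-similar pool. This is a generic Elo sketch with illustrative helper names and parameters (K-factor, window), not the authors' implementation:

```python
import random

def expected_score(ra, rb):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def update_elo(ra, rb, score_a, k=32):
    """Update both ratings after one match; score_a is 1 win, 0.5 draw, 0 loss."""
    ea = expected_score(ra, rb)
    ra_new = ra + k * (score_a - ea)
    rb_new = rb + k * ((1.0 - score_a) - (1.0 - ea))
    return ra_new, rb_new

def pick_opponent(agent, pool, window=100):
    """Prefer opponents whose rating is within `window` of the agent's rating."""
    name, rating = agent
    close = [p for p in pool if abs(p[1] - rating) <= window]
    return random.choice(close or pool)  # fall back to the full pool if empty

ratings = {"A": 1200.0, "B": 1200.0}
# Equal ratings and an A win: A gains k/2 = 16 points, B loses 16.
ratings["A"], ratings["B"] = update_elo(ratings["A"], ratings["B"], 1.0)
```

Pairing rating-similar agents keeps matches informative: a population member rated far above the training agent would win almost every game and provide little learning signal.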


Reinforcement Learning Based Self-play and State Stacking Techniques for Noisy Air Combat Environment

Tasbas, Ahmet Semih, Sahin, Safa Onur, Ure, Nazim Kemal

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has recently proven itself a powerful instrument for solving complex problems and has even surpassed human performance in several challenging applications. This suggests that RL algorithms can be applied to the autonomous air combat problem, which has been studied for many years. The complexity of air combat arises from aggressive close-range maneuvers and agile enemy behaviors. In addition to these complexities, real-life scenarios may involve uncertainty due to sensor errors, which prevents estimation of the enemy's actual position. Autonomous aircraft should therefore succeed even in noisy environments. In this study, we developed an air combat simulation that provides noisy observations to the agents, making the air combat problem even more challenging, and we present a state stacking method for noisy RL environments as a noise reduction technique. In our extensive set of experiments, the proposed method significantly outperforms the baseline algorithms in terms of winning ratio, and the performance improvement is even more pronounced at high noise levels. In addition, we incorporate a self-play scheme into our training process by periodically updating the enemy with a frozen copy of the training agent. In this way, the training agent flies air combat simulations against an enemy with progressively smarter strategies, which improves the performance and robustness of the agents. In our simulations, we demonstrate that the self-play scheme provides important performance gains compared to classical RL training.
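State stacking, as described in the abstract, feeds the policy a window of the last k noisy observations instead of only the current one, letting the network implicitly filter sensor noise. A minimal generic sketch (class and parameter names are illustrative, not the paper's code):

```python
from collections import deque

import numpy as np

class StateStacker:
    """Maintain the last k observations and expose them as one stacked state.

    With noisy sensors, a window of recent observations carries more signal
    than any single frame, which is the intuition behind state stacking.
    """

    def __init__(self, k, obs_dim):
        self.k = k
        self.obs_dim = obs_dim
        self.buf = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, obs):
        """Start an episode: pad the window with the first observation."""
        self.buf.clear()
        for _ in range(self.k):
            self.buf.append(obs)
        return self.state()

    def step(self, obs):
        """Append the newest noisy observation and return the stacked state."""
        self.buf.append(obs)
        return self.state()

    def state(self):
        return np.concatenate(self.buf)  # shape: (k * obs_dim,)

stacker = StateStacker(k=4, obs_dim=3)
s = stacker.reset(np.zeros(3))   # stacked state has shape (12,)
s = stacker.step(np.ones(3))     # newest frame occupies the last obs_dim slots
```

The same wrapper pattern covers the self-play scheme too: the environment's opponent policy is simply swapped for a frozen snapshot of the learner at fixed intervals.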


On Multi-Agent Learning in Team Sports Games

Zhao, Yunqi, Borovikov, Igor, Rupert, Jason, Somers, Caedmon, Beirami, Ahmad

arXiv.org Artificial Intelligence

In recent years, reinforcement learning has been successful in solving video games from Atari to StarCraft II. However, end-to-end model-free reinforcement learning (RL) is not sample-efficient and requires a significant amount of computational resources to achieve superhuman-level performance. Model-free RL is also unlikely to produce human-like agents for playtesting and gameplaying AI in the development cycle of complex video games. In this paper, we present a hierarchical approach to training agents with the goal of achieving human-like style and a high skill level in team sports games. While this is still work in progress, our preliminary results show that the presented approach holds promise for solving the posed multi-agent learning problem.


Winning Isn't Everything: Training Human-Like Agents for Playtesting and Game AI

Zhao, Yunqi, Borovikov, Igor, Beirami, Ahmad, Rupert, Jason, Somers, Caedmon, Harder, Jesse, Silva, Fernando de Mesentier, Kolen, John, Pinto, Jervis, Pourabolghasem, Reza, Chaput, Harold, Pestrak, James, Sardari, Mohsen, Lin, Long, Aghdaie, Navid, Zaman, Kazi

arXiv.org Artificial Intelligence

Recently, there have been several high-profile achievements of agents learning to play games against humans and beat them. We consider an alternative approach that instead addresses game design for a better player experience by training human-like game agents. Specifically, we study the problem of training game agents in service of the development processes of the game developers that design, build, and operate modern games. We highlight some of the ways in which we think intelligent agents can assist game developers to understand their games, and even to build them. Our early results using the proposed agent framework mark a few steps toward addressing the unique challenges that game developers face.